-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bzip2, lz4, and zstd independent compression filter packages #880
Add bzip2, lz4, and zstd independent compression filter packages #880
Conversation
All tests are now passing, including nightly. |
src/filters/H5Zbzip2/Project.toml
Outdated
@@ -0,0 +1,12 @@ | |||
name = "H5Zbzip2" | |||
uuid = "094576f2-1e46-4c84-8e32-c46c042eaaa2" | |||
version = "0.16.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea will be to sync the version with the main HDF5.jl package version?
I don't have a problem with this, but as you probably already know, these 'independent' packages could also just maintain their own version number that get bumped when we modify the subpackages.
I think the main benefit of the current approach is simplicity in bumping version numbers across all packages. OTOH, if we keep an independent version number I think then that reduces precompilation when we make new releases.
From judging changes in the original blosc file, changes to the filter packages are likely to be infrequent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We can only tag the whole repository, so we might as well keep it in sync with the tags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can tag a sub-package associated with a subdirectory of a repository via @JuliaRegistrator register subdir=...
. Makie e.g., uses this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is a tag and release from the git perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep all the subpackages at version 0.1.0
and tag them individually. I think the benefit is we avoid precompilation of the subpackages after new HDF5.jl releases, and most of these filter packages won't be changing frequently. I believe that makes more sense and follows semver, instead of bumping the submodules in instances were HDF5.jl was changed but the subpackages were untouched.
For releases I've been using the registrator bot. It looks like it should be straightforward to accommodate subpackages with the registrator from @thchr comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we can tell the registrator bot whatever version we want, but the question is how to do you mark these versions in Git? Do you plan on making an individual tag for each sub-package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see your point now, since these are not git submodules...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this again, does the git tag version even matter? No.
So, what'll happen is that the git tag version will be synced with the HDF5.jl version as is currently the case.
The git tag version will not agree with the tagged version of the filter subpackages, which will have their own independent versions as specified in their own Project.toml. This is the case since we aren't using git submodules, which is more trouble than it's worth.
To me this makes sense and is desired behavior since we won't have to tag all the filter packages superfluously every time a change is made in the main HDF5.jl module that doesn't modify the filter submodules.
src/filters/H5Zblosc/src/H5Zblosc.jl
Outdated
# We need to preserve the output of cbuffer_sizes under Julia 1.8 (as of 2021/12/11) | ||
# Otherwise their Refs get freed and the buffer becomes corrupted | ||
# If this fails, consider copying `in` or wrapping that array. | ||
GC.@preserve outbuf_size cbytes blocksize begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this line, H5Zblosc appears to error with Julia 1.8 in development.
cbuffer_sizes
calls blosc_cbuffer_sizes
which returns a size values via pointers / Refs
.
function cbuffer_sizes(buf)
s1 = Ref{Csize_t}()
s2 = Ref{Csize_t}()
s3 = Ref{Csize_t}()
ccall((:blosc_cbuffer_sizes,libblosc), Cvoid,
(Ptr{UInt8}, Ref{Csize_t}, Ref{Csize_t}, Ref{Csize_t}),
buf, s1, s2, s3)
return (s1[], s2[], s3[])
end
Somehow it appears that the Refs
are getting garbage collected and the value of outbuf_size
is corrupted. This does not make sense to me, but this seems to have worked fine until Julia 1.8, so perhaps there is a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been resolved in JuliaLang/julia#43408 (pending merge)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh okay, so this is unrelated to JuliaIO/Blosc.jl#86 or does the fix also require making the changes according to your comment in JuliaIO/Blosc.jl#86 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the underlying fix has been merged. I propose we remove mitigating patch here as it won't be an issue henceforth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can tag a commit as H5Zblosc_v0.1.0 when you make a release. The main convenience of keeping the versions somewhat aligned is making it easy to know what versions are compatible with what other versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is true. OTOH, I think it might not be too difficult to determine if a non-backwards compatible change is made to the filter subpackages. At worst, we could always bump the compatible HDF5 version to the latest one in the filter subpackage every time a change is made to the subpackage or HDF5.jl methods that relate to the filter.
test/external.jl
Outdated
@@ -50,5 +54,6 @@ close(source_file) | |||
|
|||
rm(fn1) | |||
# rm(fn2) | |||
@debug "external tests did not delete" fn2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot delete fn2
or run those above tests because some HDF5 handle is still open. Because of the external link, HDF5 is set to use "weak" closing, so closing the file is not sufficient. Ultimately h5close
at the end of runtests.jl does close the files and allows for the tempfile to be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I have an idea on how to fix this.
Could we split this off from this PR?
Ideally I'd like to squash-merge all the new filter functionality in a single commit/PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the debug message
Any chance you could rebase with master. |
I had this comment here: https://github.com/JuliaIO/HDF5.jl/pull/880/files#r767927849 |
Co-authored-by: Mustafa M <[email protected]>
Co-authored-by: Mustafa M <[email protected]>
f796447
to
e344882
Compare
These two pull requests should fix the remaining downstream errors. JuliaIO/MAT.jl#170 (fixes SparseMatrixCSC errors for Julia 1.7) Anything else? You are welcome to make commits directly. |
You'll have to enable "allow commits" in the PR since I don't have access. |
I invited you to the repository. I guess it's trickier since it's coming from an organizational account. |
All tests should be passing now. https://github.com/JuliaIO/HDF5.jl/runs/4530943064?check_suite_focus=true |
Co-authored-by: Mustafa M <[email protected]>
src/filters/filters.jl
Outdated
macro dev_embedded_filters() | ||
quote | ||
@eval begin | ||
using Pkg | ||
Pkg.develop(PackageSpec(; path=joinpath(dirname(pathof(HDF5)), "filters", "H5Zblosc"))) | ||
Pkg.develop(PackageSpec(; path=joinpath(dirname(pathof(HDF5)), "filters", "H5Zbzip2"))) | ||
Pkg.develop(PackageSpec(; path=joinpath(dirname(pathof(HDF5)), "filters", "H5Zlz4"))) | ||
Pkg.develop(PackageSpec(; path=joinpath(dirname(pathof(HDF5)), "filters", "H5Zzstd"))) | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the macro which devs the filter packages into the test environment.
Looks like there may be an aws issue downloading the non ubuntu binaries? |
@mkitti any chance you could review my changes. |
I think we need to in a follow up PR or this PR, either: (1) deprecate setting (2) Alternatively, we could add hooks for :zstd, :lz4, :bzip2 like we do for the blosc package @require H5Zblosc="c8ec2601-a99c-407f-b158-e79c03c2f5f7" @eval begin
set_blosc!(p::Properties, val::Bool) = val && push!(Filters.FilterPipeline(p), H5Zblosc.BloscFilter())
set_blosc!(p::Properties, level::Integer) = push!(Filters.FilterPipeline(p), H5Zblosc.BloscFilter(level=level))
end @simonbyrne maybe you also have an opinion on which option is more desirable from a ux perspective. |
I generally think that we should deprecate the older syntax Rather than this Lines 438 to 452 in 00dcb13
an extensible system would work as follows # Defined in HDF5.jl
function class_setproperty!(::Type{DatasetCreateProperties}, p::Properties, name::Symbol, val)
class_setproperty!(::Type{DatasetCreateProperties}, p, Val(name), val)
end
# Defined in H5Zblosc.jl
import HDF5: class_setproperty!
class_setproperty!(::Type{DatasetCreateProperties}, p::Properties, Val{:blosc}, val) = set_blosc!(p, val) That said, I think the |
filter_path = joinpath(dirname(pathof(HDF5)), "filters") | ||
Pkg.develop(PackageSpec(path=joinpath(filter_path, "H5Zblosc"))) | ||
Pkg.develop(PackageSpec(path=joinpath(filter_path, "H5Zbzip2"))) | ||
Pkg.develop(PackageSpec(path=joinpath(filter_path, "H5Zlz4"))) | ||
Pkg.develop(PackageSpec(path=joinpath(filter_path, "H5Zzstd"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did like the idea of having this in some kind of callable closure. I anticipate downstream projects may want to do something similar via CI. Perhaps though it could it its own file under test/
?
Overall the changes look fine to me. We just need to register the packages for some of the tests to work. 👍 |
It will probably be easier to iterate and test this once the subpackages are registered. I suggest registering the subpackages well before we register HDF5 0.16 itself. Then at least we can |
Sure, is dev'ing not working in some cases for you ? I've just manually kept calling
when need be... In any case, we'll have to register the subpackages first. I think I'll merge this PR now and we then need to iterate on the new filters interface. I do agree that the second approach is much more flexible, so we'd have to appropriately deprecate existing calls. |
Thank you very much @mkitti for your effort and patience with these PRs! I'm really excited for these changes in the upcoming v0.16 release. Also sorry to drag out some of the licensing issues. |
I've submitted registration for all the subpackages, they will take a day or two since they are blocked until a manual merge is done, as they depend on v0.16 which is yet released. |
We could ask for manual merge if needed. Perhaps it might be good to reach out so others know what we are doing. |
This branch splits the compression plugin code into independent subdirectory packages. Consider merging this into #875.
The main purpose of this pull request is to test CI.